Sentiment Analysis on The Repeal of 377A in Singapore

Version 1: 4 January 2023

Introduction

On 29 November 2022, in a historic move, the Singapore Parliament voted to repeal Section 377A of the Penal Code, a law criminalising gay sex. The law, introduced under British colonial rule in 1938, had been in force for over 80 years. Its repeal was widely celebrated by activists and members of the LGBTQ+ community, some of whom had been calling for it in Singapore since as early as 2007 (Channel News Asia, 2022b). On the other hand, some conservative members of society, such as religious groups, expressed disappointment.

At the same time as the repeal, however, Parliament also amended the Constitution to introduce a new clause protecting the institution of marriage. The amendment protects the definition of marriage as between a man and a woman from legal challenge in Singapore. While gay sex is no longer criminal, this suggests that Singapore society is still not ready to accept marriages that are not between a man and a woman.

Hence, there are both progressive and conservative elements in Singapore society surrounding the issue of 377A, and opinions appear polarised. This project aims to understand public sentiment surrounding the recent repeal of 377A and the degree of support for or opposition to it. It primarily uses data from Twitter and Reddit. Twitter is an open social network on which individuals converse with one another in 'tweets', or short messages. Reddit is a social news aggregation and discussion website. I have chosen Twitter and Reddit because they are two social media platforms that Singaporeans, particularly youths, use regularly to discuss and debate current affairs, and their APIs are easily accessible for scraping data.

The flow of my project and the main packages used will be as follows:

[Figure: project flow and main packages used]

Scope

Data Description

In this project, I use the following data:

1) google_377a.csv: A CSV file containing Google Trends data for the keyword '377A'.

2) tweets: A Pandas DataFrame containing tweets scraped from Twitter with the keyword '377A'. Scraped using the Python library Tweepy.

3) Reddit data, scraped using the package subreddit-comments-dl from https://github.com/pistocop/subreddit-comments-dl

Research Question

Main question: Is the overall public sentiment towards the repeal of 377A positive or negative?

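Further questions:

2. How polarised are opinions towards the repeal within Singapore society?

3. What specific emotions do people feel about the repeal?

4. How have sentiments towards the repeal changed over time?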

Time Period of Data

To determine the period over which to scrape data, I use Google web searches as a proxy to track interest in the topic of 377A over time. Using Google Trends, I downloaded data on the Google web searches performed in Singapore in 2022 for the keyword '377A'.

The data can be found here: https://trends.google.com/trends/explore?geo=SG&q=377A.

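I insert vertical lines to mark two significant dates on the graph: 21 August 2022, when the government announced that it would repeal 377A, and 29 November 2022, when Parliament voted to repeal it.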

We can see a sudden spike in interest in 377A from 21 August, when Prime Minister Lee first announced that the government would repeal 377A.

Hence, I choose to scrape data from 21 August to the end of the year, 31 December.
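The period selection above can be sketched in code. This is a minimal sketch and not the notebook's original code. It assumes google_377a.csv follows the usual Google Trends export layout: a 'Category' line, a blank line, then a header row starting with 'Week' or 'Day'. The function names load_trends and biggest_jump are hypothetical. The sketch reads the interest values and reports the date with the largest rise over the previous point, which is where the spike appears. Vertical date markers like the ones on the graph can be drawn with matplotlib's Axes.axvline.

```python
import csv
import io
from datetime import date


def load_trends(lines):
    """Parse a Google Trends CSV export into a list of (date, interest) pairs.

    Assumes the common export layout: a 'Category: ...' line, a blank line,
    then a header row whose first cell is 'Week' or 'Day'. Google reports very
    low interest as '<1'; that is treated as 0 here.
    """
    rows = list(csv.reader(lines))
    # Locate the header row; every non-empty row after it is a data point.
    header = next(i for i, row in enumerate(rows) if row and row[0] in ("Week", "Day"))
    series = []
    for row in rows[header + 1:]:
        if not row:
            continue
        value = 0 if row[1] == "<1" else int(row[1])
        series.append((date.fromisoformat(row[0]), value))
    return series


def biggest_jump(series):
    """Return the date whose interest rose the most compared with the previous point."""
    best = max(range(1, len(series)), key=lambda i: series[i][1] - series[i - 1][1])
    return series[best][0]


# Illustrative numbers only, not the real Google Trends values.
sample = io.StringIO(
    "Category: All categories\n"
    "\n"
    "Week,377A: (Singapore)\n"
    "2022-08-07,<1\n"
    "2022-08-14,3\n"
    "2022-08-21,100\n"
    "2022-08-28,40\n"
)
print(biggest_jump(load_trends(sample)))  # 2022-08-21
```

With the real file, `with open("google_377a.csv", newline="") as f: series = load_trends(f)` would produce the series to plot.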

Data Scraping and Cleaning

Scraping Tweets

Note: Due to the limitations of Twitter's Developer Account (Elevated Access), I am currently only able to scrape tweets from the last 30 days in the development sandbox. Hence, I have scraped tweets from the period 1 December 2022 to 31 December 2022. In future versions of this project, I will seek to scrape tweets over the full period from 21 August 2022 to 31 December 2022.

Cleaning Tweets

We get a DataFrame of 296 tweets on 377A from December 2022.
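The cleaning steps are not shown in this write-up. As an assumption, a typical pass over tweets removes retweet prefixes, links and @mentions, then drops the duplicates that retweets create. The sketch below does this with Python's re module. The rules and the names clean_tweet and clean_tweets are hypothetical, not the notebook's code. On a pandas DataFrame, the same function could be applied with Series.apply followed by drop_duplicates.

```python
import re

# Hypothetical cleaning rules; the notebook's exact steps are not shown.
RT_RE = re.compile(r"^RT\s+@\w+:\s*")   # leading "RT @user: " of a retweet
URL_RE = re.compile(r"https?://\S+")    # links
MENTION_RE = re.compile(r"@\w+")        # @mentions


def clean_tweet(text):
    """Strip retweet prefixes, URLs, @mentions and '#' signs, then collapse whitespace."""
    text = RT_RE.sub("", text)  # must run before the mention rule
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return " ".join(text.split())


def clean_tweets(texts):
    """Clean every tweet and drop empty results and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for text in texts:
        c = clean_tweet(text)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned


print(clean_tweet("RT @user: Singapore turns page on dark LGBT history https://example.com/x #377A"))
# Singapore turns page on dark LGBT history 377A
```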

Scraping Reddit Comments

Using the package subreddit-comments-dl (https://github.com/pistocop/subreddit-comments-dl) and my Reddit API credentials, I scraped comments from two subreddits that discuss current events in Singapore: r/Singapore and r/SingaporeRaw.

Cleaning Reddit Comments

We get a DataFrame of 497 Reddit comments on 377A.
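The Reddit cleaning steps are not shown either. The downloader collects comments from whole subreddits. One plausible step, assumed here, is to keep only comments that mention 377A and to drop the '[deleted]' and '[removed]' placeholders Reddit shows for deleted comments. The keyword pattern and the function name relevant_comments are hypothetical. The sketch also assumes the comment bodies have already been loaded into a list of strings, since the downloader's output format is not described here.

```python
import re

# Placeholder bodies Reddit shows for deleted or removed comments.
PLACEHOLDERS = {"[deleted]", "[removed]"}
# Hypothetical relevance rule: '377A' or '377 A', case-insensitive.
KEYWORD_RE = re.compile(r"\b377\s?a\b", re.IGNORECASE)


def relevant_comments(bodies):
    """Keep stripped comment bodies that are not placeholders and mention 377A."""
    return [body.strip() for body in bodies
            if body.strip() not in PLACEHOLDERS and KEYWORD_RE.search(body)]


print(relevant_comments(["[deleted]", "Repeal 377a now", "unrelated", "377 A is gone"]))
# ['Repeal 377a now', '377 A is gone']
```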

Sentiment Analysis

After cleaning the data, I perform sentiment analysis to generate quantitative measures of sentiment, which allow me to answer my research questions. I use three NLP packages, each serving a different function: TextBlob, NRCLex and nltk.sentiment.vader.

Choice of NLP Packages

Firstly, I use TextBlob to detect subjectivity and polarity. Subjectivity quantitatively measures how much of a text is personal opinion rather than factual information (Liu, 2012). The subjectivity value falls within the range [0, 1]; the higher the subjectivity, the more personal opinion the text contains. Polarity quantitatively measures the level of positivity or negativity in the text. The polarity value falls within the range [-1, 1], with -1 indicating a negative sentiment and 1 indicating a positive sentiment (ibid). In our case, these measures give a general overview of whether people feel positively or negatively about the repeal of 377A and how opinionated their posts are.

Secondly, I use nltk.sentiment.vader to quantitatively measure the overall positivity or negativity of the sentiments. VADER, which stands for Valence Aware Dictionary and sEntiment Reasoner, is a rule-based model developed by Hutto and Gilbert (2014). TextBlob already measures positivity and negativity through its polarity score. I use VADER as well because it has been shown to achieve high accuracy on social media text, even outperforming individual human raters (ibid). VADER also reports how neutral a text is. Using VADER, we can more accurately discern the level of positivity or negativity in online sentiment towards the repeal of 377A.

Thirdly, I use NRCLex to detect emotions. The NRCLex package quantitatively measures the emotional affects of a given text (Bailey, 2019). It draws on the National Research Council Canada (NRC) affect lexicon and the NLTK library's WordNet synonym sets (ibid). The affects it measures are fear, anger, anticipation, trust, surprise, positive, negative, sadness, disgust and joy. It helps identify the specific emotions that people feel towards the repeal of 377A, rather than just 'positive' or 'negative'.

Generating Sentiment Analysis

Visualisations and Findings

Polarity and Subjectivity

For both the Twitter and Reddit data, I plot subjectivity (y-axis) against polarity (x-axis).

Twitter

Reddit

The Reddit graph is denser, which is to be expected because the Reddit dataset (497 comments) is larger than the Twitter dataset (296 tweets). Otherwise, the two graphs are similar in shape. We observe a roughly V-shaped pattern, which is commonly seen in sentiment analysis. This is because the more subjective a text is, the more personal opinion it contains, and thus the more likely it is to be strongly positive or strongly negative in polarity.

From both graphs, we also see that more dots lie on the right of the vertical line in the centre, where polarity = 0. This indicates that sentiments on the repeal of 377A are more positive than negative. There does not seem to be a significant difference in terms of overall sentiment on Twitter versus Reddit.

In the graph for Reddit comments, a few dots lie exactly on the vertical line in the centre, with polarity = 0 and subjectivity within [0, 0.2]. This indicates that these comments are neutral and factual. They could be excerpts from news articles, such as those posted by Reddit bots that automatically scrape and post news articles.
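The two readings of the plots above can be checked numerically. TextBlob's sentiment property returns a (polarity, subjectivity) pair for each text. Assuming those pairs are collected in a list, the hypothetical helpers below do two things. The first counts texts on each side of the polarity = 0 line. The second picks out near-factual texts with polarity exactly 0 and subjectivity of at most 0.2.

```python
def polarity_split(scores):
    """Count (polarity, subjectivity) pairs right of, left of, and on the polarity = 0 line."""
    positive = sum(1 for polarity, _ in scores if polarity > 0)
    negative = sum(1 for polarity, _ in scores if polarity < 0)
    return {"positive": positive, "negative": negative,
            "neutral": len(scores) - positive - negative}


def likely_factual(scores, max_subjectivity=0.2):
    """Indices of texts with polarity exactly 0 and low subjectivity."""
    return [i for i, (polarity, subjectivity) in enumerate(scores)
            if polarity == 0 and subjectivity <= max_subjectivity]


# Made-up scores for illustration only.
example = [(0.5, 0.6), (-0.2, 0.4), (0.0, 0.1), (0.0, 0.5), (0.1, 0.3)]
print(polarity_split(example))  # {'positive': 2, 'negative': 1, 'neutral': 2}
print(likely_factual(example))  # [2]
```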

VADER

For each platform, I plot a boxplot of the VADER compound score (vader_compound) of each text. The compound score sums the valence scores of the words in the text, after VADER's rule-based adjustments, and normalises the sum to fall within the range [-1, 1] (Patil et al., 2019). It is not the sum of the separate positive, negative and neutral scores, which are proportions that add up to 1. The compound score reveals the polarity of the text: the nearer it is to 1, the more positive the text; the nearer it is to -1, the more negative.
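The normalisation step can be written out. In the VADER reference implementation, the summed valence x is mapped to x / sqrt(x² + alpha), with alpha = 15, and clipped to [-1, 1]. The sketch below reproduces only that step. The function name is hypothetical, and the rule-based valence adjustments are not reimplemented. In the notebook itself, the compound value would come from SentimentIntensityAnalyzer().polarity_scores(text)['compound'] in nltk.sentiment.vader.

```python
import math


def vader_normalize(score, alpha=15):
    """Map an unbounded sum of word valences into [-1, 1], as VADER's compound score does."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # clip, as the reference implementation does


print(round(vader_normalize(2.0), 4))  # 0.4588
```

Large sums approach but never quite reach ±1, which is why strongly worded texts cluster near the ends of the range.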

Twitter

Reddit

Unlike TextBlob's analysis, the VADER analysis reveals clear differences between the polarity of sentiments on Twitter and on Reddit. The mean VADER compound score of tweets is 0.0944 and the median is 0, indicating that, on average, sentiment relating to 377A on Twitter is roughly neutral. The mean VADER compound score of Reddit comments is 0.176, while the median is 0.237. Both are low positive values, indicating that Reddit comments relating to 377A are overall slightly positive, and more positive than tweets.

Notably, the lower and upper fences of the Twitter scores are -0.572 and 0.944 respectively. The same values for Reddit are -0.997 and 0.998, very close to -1 and 1. This indicates that Reddit contains a broader spectrum of opinions than Twitter, including more strongly positive and more strongly negative ones.
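The summary statistics quoted above can be computed without plotting. This sketch assumes the reported fences are whisker ends under the common 1.5 × IQR rule. The notebook's plotting library and quartile method are not stated, so the values it gives may differ slightly. The helper name box_summary is hypothetical.

```python
import statistics


def box_summary(values):
    """Mean, median, quartiles and whisker ends ('fences') for a list of compound scores.

    Whiskers follow the common 1.5 * IQR rule: they end at the most extreme data
    points that still lie within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "lower_fence": min(inside),
        "upper_fence": max(inside),
    }


# Made-up scores: -1.0 falls outside the whiskers and would be drawn as an outlier.
r = box_summary([-1.0, 0.0, 0.1, 0.2, 0.3])
print(r["lower_fence"], r["upper_fence"])  # 0.0 0.3
```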

Emotions Analysis

Lastly, I draw a scatter plot of the emotional affect values generated by NRCLex for the 10 affects against time. I also fit a linear regression line for each affect to estimate its trend over time. This helps identify any changes in feelings across time, such as between the announcement of the repeal in late August and the passing of the repeal in late November. However, this is more useful for the Reddit data, as the Twitter data only spans 1-31 December.
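The trend lines can be sketched as an ordinary least-squares fit of each affect's score against the number of days since the first post. This is a minimal sketch under assumptions. Records are (date, scores) pairs, where scores is a dict of affect values such as those from NRCLex. The exact key names depend on the NRCLex version and are not assumed here. The helper names are hypothetical. A slope close to 0 corresponds to a nearly horizontal line. Because the fitted line is straight, its largest value over the period always sits at one end of the date range.

```python
from datetime import date


def trend_line(dates, values):
    """Least-squares fit of values against days since the first date.

    Returns (slope per day, intercept at the first date).
    """
    start = min(dates)
    xs = [(d - start).days for d in dates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(values) / len(values)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two different dates to fit a trend")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trends_by_emotion(records, emotions):
    """Fit one trend line per affect; a missing affect counts as 0 for that text."""
    dates = [d for d, _ in records]
    return {e: trend_line(dates, [scores.get(e, 0.0) for _, scores in records])
            for e in emotions}


slope, intercept = trend_line(
    [date(2022, 12, 1), date(2022, 12, 2), date(2022, 12, 3)], [0.1, 0.2, 0.3])
print(round(slope, 3), round(intercept, 3))  # 0.1 0.1
```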

Twitter

Most of the emotional affects have values close to 0, indicating that they are only weakly present. I have picked out a few notable emotions to analyse.

Notable Emotions

Firstly, sadness was high compared to the other emotions around the start of December, right after 377A was repealed on 29 November. However, this was brief, and the level of sadness detected in tweets on 377A declined as the month went on. Notably, several 'very sad' tweets right after 4 December have a perfect sadness value of 1. These outliers raise the average level of sadness detected in the tweets, although most tweets actually score below 0.5 for sadness. Hence, apart from a few outliers, the average sadness level is not that high.

In contrast, the regression line for 'joy' is quite close to the x-axis, with the largest value on the line being y=0.0579. This indicates that overall levels of joy are not very high. At the start of December, the joy values of most tweets fall within the range [0, 0.3]. This suggests that Twitter users were not very joyful about the repeal of 377A.

Figure 1. Graph of sadness measured in tweets on 377A

Figure 2. Graph of joy measured in tweets on 377A

Finally, the regression line for the 'positive' emotional affect sits higher above the x-axis than the line for the 'negative' emotional affect, although the absolute values measured are still relatively low. This is consistent with our earlier analysis that Twitter sentiments towards the repeal of 377A are more positive than negative.

Figure 3. Graph of positivity and negativity measured in tweets on 377A

Reddit

Most of the data points are clustered in 21 August-5 September and 27-30 November, indicating spikes in discussion of 377A during these periods. This matches the periods of high Google search interest in '377A' seen in the Google Trends data earlier. It reflects the Prime Minister's announcement of the repeal on 21 August and the actual repeal on 29 November.

The data also shows points concentrated around 20-23 October, indicating greater discussion of 377A on Reddit during those days. This is because on 20 October, the bill to repeal 377A was tabled in Parliament together with the bill to amend the Constitution to protect the definition of marriage (Channel News Asia, 2022a). Although this news did not produce a significant rise in Google searches for 377A, it did spark discussion on Reddit.

Turning to the various emotional affects, we see that the regression lines have gentle gradients and are almost horizontal. This means that the level of each affect is fairly consistent across the period, with little fluctuation. Some affects, such as 'surprise' and 'disgust', lie very close to the x-axis, showing that little surprise or disgust is detected in Reddit comments on 377A.

Notable Emotions

The emotional affect with the second-highest values, based on its linear regression line, is 'trust'. Although it is not clear in whom the trust is placed, it plausibly reflects people's trust in the government on the issue of repealing 377A. 'Trust' also shows relatively high values compared to other emotional affects in the Twitter emotional analysis graph.

Figure 4. Graph of trust measured in Reddit comments on 377A

The level of 'fear' appears to be low but not insignificant. All fear values lie below 0.5, and many lie within the range [0, 0.3], showing that many Reddit commenters express a small degree of fear regarding 377A. Notably, unlike the other emotional affects, 'fear' has no outliers with a value of 1.

Figure 5. Graph of fear measured in Reddit comments on 377A

Based on the linear regression lines, the affect with the highest values is 'positive', while the affect with the third-highest values is 'negative'. As with the Twitter emotional affect graph, positive values appear higher than negative values on the whole. This corroborates our earlier analysis that the repeal of 377A was received with more positive than negative sentiment.

Figure 6. Graph of positivity and negativity measured in Reddit comments on 377A

Word Cloud

Finally, using the WordCloud package, I plot word clouds from the keywords generated earlier to explore issues or commonly held opinions surrounding the repeal of 377A. A word cloud visually represents the words used in a text, with more frequent words drawn larger. This also lets us compare the debate around the repeal of 377A on Twitter and on Reddit.
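A word cloud sizes each word by how often it occurs. The sketch below shows that counting step with collections.Counter. The tokenising regex and the small stopword list are assumptions, not the notebook's actual settings, and word_frequencies is a hypothetical name. The resulting counts could be passed as a dict to the WordCloud package's generate_from_frequencies method to draw the cloud.

```python
import re
from collections import Counter

# Hypothetical stopword list; the notebook's actual list is not shown.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "for", "it", "that"}


def word_frequencies(texts, top_n=10):
    """Count lower-cased words across texts, skipping stopwords and one-letter tokens."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if len(word) > 1 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)


print(word_frequencies(["Singapore turns page on dark LGBT history",
                        "Singapore repeal 377A", "the repeal"], top_n=2))
```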

Twitter

Reddit

Observing both word clouds, we see that some keywords are common, such as 'Singapore', 'repeal', 'gay' and 'LGBT'. These are expected as they are keywords relating to the 377A law.

In the Twitter word cloud, the words 'turn', 'page', 'dark' and 'history' are very prominent. This is likely because many Twitter users were tweeting, quote tweeting or retweeting the BBC article '377A repeal: Singapore turns page on dark LGBT history' (link: https://www.bbc.co.uk/news/world-asia-63832825). It is interesting that a British news article on a law repeal in Singapore is the most mentioned. This could suggest that many of the Twitter users tweeting about 377A are international rather than Singaporean, as Singaporeans tend to read news on Singapore from local outlets, such as The Straits Times or CNA.

In the Reddit word cloud, we see that the words 'marriage' and 'law' are very large. This suggests that Reddit users were commenting on the government's constitutional amendment to the definition of marriage in late November, although their opinion on the issue is not clear from the word cloud. In contrast, these words are less prominent in the Twitter word cloud.

Conclusion

1. Is the overall public sentiment towards the repeal of 377A positive or negative?

Based on the data from all three packages, we see that the overall public sentiment towards the repeal of 377A is more positive than negative. However, this positive opinion is not strong.

This finding is consistent with surveys conducted in Singapore. In a survey at the end of August, Blackbox Research found that 43% of Singaporean adults supported the repeal, while 21% opposed it (The Straits Times, 2022). This shows that more people held positive than negative sentiments towards the repeal. A further 34% remained neutral and 2% did not state their stance.

Although the constitutional amendment was discussed, particularly on Reddit, the overall sentiment towards the repeal of 377A appears to still be positive.

2. How polarised are opinions towards the repeal within Singapore society?

Contrary to mainstream media narratives of conservative religious groups being at loggerheads with liberal LGBTQ+ activists, opinion within society on the issue of 377A does not appear very polarised. Most people express slightly positive or roughly neutral sentiments rather than strongly positive or negative ones.

3. What specific emotions do people feel about the repeal?

In the emotional affect graphs, apart from the 'positive' and 'negative' affects, 'trust' and 'anticipation' rank highly compared to other emotions. This could suggest that people trust the government and anticipated the repeal of 377A, or more progressive change in society. However, these values are not high, indicating weak rather than strong emotions.

4. How have sentiments towards the repeal changed over time?

Observing the linear regression lines in the Reddit emotional analysis graph, we see that they have gentle gradients. This shows that there has actually been little change in emotions or sentiments over time, from the announcement of the government's intention to repeal to the actual repealing.

Limitations and Areas for Improvement

Research Scope and Methodology

Firstly, the Twitter dataset is not satisfactory, as it only covers the period 1-31 December. Ideally, we should scrape Twitter data from 21 August to 31 December. This would also provide a better basis for comparison with the Reddit data and allow us to track changes in sentiment in tweets across time. Unfortunately, a full-archive search with Tweepy requires Academic Access to the Twitter APIs, which I am currently unable to access.

Secondly, it is hard to confirm whether commenters are Singaporean. For Reddit, most users who post on r/Singapore and r/SingaporeRaw are likely Singaporeans, as these are niche, Singapore-focused communities. However, it is hard to confirm whether Twitter users posting about '377A' are Singaporeans, as they are not tweeting within a specific community. Given the longstanding nature of 377A and the conservative nature of Asian society, the Singapore government's decision to repeal it attracted the attention of, and was lauded by, many international onlookers, particularly in other Asian countries. Tweets frequently mentioned the BBC article rather than an article from a local news outlet, which could suggest that many of these Twitter users were not from Singapore. This limits the effectiveness of this project in analysing Singaporeans' sentiments towards the repeal.

Thirdly, to gain a more complete picture of public sentiment, we could scrape data from other social media platforms. Twitter and Reddit may not represent the sentiments of the wider population, particularly middle-aged adults, who may prefer Facebook, or younger teenagers, who prefer Instagram and TikTok. However, it would be similarly difficult to confirm whether commenters on those platforms are Singaporean.

Data Science

Firstly, the project was unable to capture sentiments expressed in languages other than English. As Singapore is a multi-ethnic society with four main ethnic groups, some opinions about Singapore on social media are likely to be expressed in other languages, such as Malay or Mandarin. Looking briefly through the Twitter and Reddit data, we see phrases such as '彩虹沙河' in Mandarin, which translates to 'Rainbow Sand River' and likely indicates the user's support for LGBTQ+ rights. Such sentiments are not captured by this project.
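A rough first step towards measuring how much non-English content was missed would be to flag texts containing letters outside the ASCII range, such as Chinese or Tamil characters. The helper below is hypothetical and only a heuristic. It cannot detect Malay, which is written in Latin script, and it would also flag accented Latin letters.

```python
def has_non_ascii_letters(text):
    """True if the text contains any alphabetic character outside ASCII."""
    return any(ch.isalpha() and not ch.isascii() for ch in text)


print(has_non_ascii_letters("彩虹沙河"))      # True
print(has_non_ascii_letters("Repeal 377A!"))  # False
```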

Secondly, for more accurate or more extensive keyword analysis, stemming and lemmatisation can be performed using nltk packages. I did not stem or lemmatise the keywords in this project, as my focus was not on keyword analysis and I did not want to risk incorrectly stemming or lemmatising a keyword.

References

Bailey, M. M. (Ed.). (2019). NRCLex: An affect generator based on TextBlob and the NRC affect lexicon. PyPI. https://pypi.org/project/NRCLex/

Channel News Asia. (2022a, October 20). Bills to repeal 377A, amend Constitution to protect definition of marriage tabled in Parliament. CNA. https://www.channelnewsasia.com/singapore/repeal-377a-constitution-amendments-protect-marriage-parliament-bill-first-reading-3016736?cid=FBcna

Channel News Asia. (2022b, November 28). Timeline: Repealing Section 377A and amending the Constitution to protect the definition of marriage. CNA. https://www.channelnewsasia.com/singapore/timeline-repealing-377a-gay-sex-law-amending-singapore-constitution-marriage-definition-3101551

Hutto, C., & Gilbert, E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://ojs.aaai.org/index.php/ICWSM/article/view/14550/14399

Liu, B. (2012). Sentiment Analysis and Opinion Mining. https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf

Patil, A., R, A., Rayar, S., & K M, V. (2019). Comparison of VADER and LSTM for Sentiment Analysis. International Journal of Recent Technology and Engineering (IJRTE), 7(6S).

The Straits Times. (2022, August 25). Section 377A: Religious groups call for unity; poll finds 43% support repeal, double those against. The Straits Times. https://www.straitstimes.com/singapore/politics/section-377a-religious-groups-call-for-unity-poll-finds-43-per-cent-support-repeal-double-those-against